-
Free, publicly-accessible full text available October 1, 2026
-
Pretraining language models on formal language can improve their acquisition of natural language. Which features of the formal language impart an inductive bias that leads to effective transfer? Drawing on insights from linguistics and complexity theory, we hypothesize that effective transfer occurs when two conditions are met: the formal language should capture the dependency structures present in natural language, and it should remain within the computational limitations of the model architecture. We experiment with pre-pretraining (training on formal language before natural language) on transformers and find that formal languages capturing hierarchical dependencies indeed enable language models to achieve lower loss on natural language and better linguistic generalization compared to other formal languages. We also find modest support for the hypothesis that the formal language should fall within the computational limitations of the architecture. Strikingly, pre-pretraining reduces loss more efficiently than training on a matched amount of natural language. For a 1B-parameter language model trained on roughly 1.6B tokens of natural language, pre-pretraining achieves the same loss and better linguistic generalization with a 33% smaller token budget. Finally, we also give mechanistic evidence of transfer from formal to natural language: attention heads acquired during pre-pretraining remain crucial for the model's performance on syntactic evaluations.
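The abstract does not specify which formal languages were used; Dyck languages (balanced nested brackets) are a standard stand-in for the hierarchical dependencies it describes. A minimal sketch of a generator for such pre-pretraining data, with all parameter names hypothetical:

```python
import random

def sample_dyck(max_depth=8, p_open=0.5,
                pairs=(("(", ")"), ("[", "]")), rng=None):
    """Sample one string from a toy Dyck-2 language: every opening bracket
    is eventually closed by its matching partner in last-in-first-out
    order, producing nested (hierarchical, non-crossing) dependencies."""
    rng = rng or random.Random()
    out, stack = [], []
    while True:
        if stack and (len(stack) >= max_depth or rng.random() >= p_open):
            out.append(stack.pop())   # close the most recent open dependency
        elif not stack and out:
            break                     # string is complete and balanced
        else:
            o, c = rng.choice(pairs)
            out.append(o)
            stack.append(c)           # remember which close token is owed
    return "".join(out)
```

Strings drawn this way could be tokenized and fed to the model before natural-language pretraining; the depth cap (`max_depth`) is one way to keep the language within a fixed computational budget, echoing the abstract's second condition.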
-
Language model performance depends on identifying the optimal mixture of data groups to train on (e.g., law, code, math). Prior work has proposed a diverse set of methods to efficiently learn mixture proportions, ranging from fitting regression models over training runs to dynamically updating proportions throughout training. Surprisingly, we find that no existing method consistently outperforms a simple stratified sampling baseline in terms of average test perplexity. To understand this inconsistency, we unify existing methods into a standard framework, showing they are equivalent to solving a common optimization problem: minimize average loss subject to a method-specific mixing law -- an implicit assumption on the relationship between loss and mixture proportions. This framework suggests that measuring the fidelity of a method's mixing law can offer insights into its performance. Empirically, we find that existing methods set their mixing law parameters inaccurately, resulting in the inconsistent mixing performance we observe. Using this insight, we derive a new online method named Aioli, which directly estimates the mixing law parameters throughout training and uses them to dynamically adjust proportions. Aioli outperforms stratified sampling on 6 out of 6 datasets by an average of 0.27 test perplexity points, whereas existing methods fail to consistently beat stratified sampling, doing up to 6.9 points worse. Moreover, in a practical setting where proportions are learned on shorter runs due to computational constraints, Aioli can dynamically adjust these proportions over the full training run, consistently improving performance over existing methods by up to 12.012 test perplexity points.
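Aioli's actual estimator is not reproduced in the abstract; as a sketch of the general idea of dynamically adjusting proportions from observed per-group losses, here is a generic multiplicative-weights update on a toy mixing law. The update rule, the synthetic loss model, and all numbers are illustrative assumptions, not the paper's method:

```python
import numpy as np

def update_mixture(p, group_losses, step=0.1):
    """One multiplicative-weights step on the simplex: upweight groups
    whose current loss is above average, so more of the training budget
    flows to where the model is weakest. NOT the Aioli estimator --
    a generic online stand-in for dynamic proportion adjustment."""
    w = p * np.exp(step * (group_losses - group_losses.mean()))
    return w / w.sum()

# Toy demo: per-group loss decays with the cumulative proportion of the
# budget allocated to that group (a hypothetical mixing-law stand-in).
k = 3
p = np.full(k, 1.0 / k)            # start from stratified (uniform) sampling
alloc = np.zeros(k)
rates = np.array([2.0, 1.0, 0.5])  # hypothetical per-group learning rates
for t in range(50):
    alloc += p
    losses = 5.0 / (1.0 + rates * alloc)
    p = update_mixture(p, losses, step=0.2)
```

Under this toy law, the slow-learning group (smallest rate) keeps the highest loss and therefore ends up with the largest proportion; the abstract's point is that the fidelity of whatever mixing law the method assumes determines whether such dynamic schemes beat plain stratified sampling.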
-
Soares, Cláudio (Ed.)
Abstract: Extremophile organisms are known that can metabolize at temperatures down to −25 °C (psychrophiles) and up to 122 °C (hyperthermophiles). Understanding viability under extreme conditions is relevant for human health, biotechnological applications, and our search for life elsewhere in the universe. Information about the stability and dynamics of proteins under environmental extremes is an important factor in this regard. Here we compare the dynamics of small Fe-S proteins – rubredoxins – from psychrophilic and hyperthermophilic microorganisms, using three different nuclear techniques as well as molecular dynamics calculations to quantify motion at the Fe site. The theory of 'corresponding states' posits that homologous proteins from different extremophiles have comparable flexibilities at the optimum growth temperatures of their respective organisms. Although 'corresponding states' would predict greater flexibility for rubredoxins that operate at low temperatures, we find that from 4 to 300 K, the dynamics of the Fe sites in these homologous proteins are essentially equivalent.
-
Abstract: Isotopic fractionation has been linked to the lattice vibrations of materials through their phonon spectra. The Lamb-Mössbauer factor (f_LM) has the potential to provide information about the lattice vibrations in materials. We constrain the temperature evolution of the f_LM of γ- and ε-Fe at in situ high-P-T conditions between 1650 K and the melting point. We find that the vibrations of γ- and ε-Fe can be described using a quasiharmonic model with a pressure- and temperature-dependent Debye temperature computed from the measured f_LM. From the Debye temperature, we derive the equilibrium isotopic fractionation β-factor of iron. Our results show that the quasiharmonic behavior of metallic iron would lower the value of ln β for ⁵⁷Fe/⁵⁴Fe by 0.1‰ at 1600–2800 K and 50 GPa when compared to the extrapolation of room-temperature nuclear resonant inelastic X-ray scattering data. Our study suggests that anharmonicity may be more prevalent in Fe metal than in lower-mantle minerals at 2800 K and 50 GPa, conditions relevant to core formation, and that the silicate mantle may be isotopically heavy in iron.
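The step from a measured f_LM to a Debye temperature can be made concrete with the standard Debye-model expression for the recoilless fraction (a textbook relation; the paper's exact quasiharmonic parametrization may differ):

```latex
% The Lamb-Mössbauer factor is set by the mean-square displacement
% <u^2> of the resonant nucleus along the photon wavevector k:
f_{\mathrm{LM}} = \exp\!\left(-k^{2}\,\langle u^{2}\rangle\right),
% which, for a Debye solid with Debye temperature \Theta_D and
% recoil energy E_R = \hbar^2 k^2 / 2M, becomes
f_{\mathrm{LM}}(T) = \exp\!\left[-\frac{6E_{R}}{k_{B}\Theta_{D}}
  \left(\frac{1}{4} + \left(\frac{T}{\Theta_{D}}\right)^{2}
  \int_{0}^{\Theta_{D}/T}\frac{x\,\mathrm{d}x}{e^{x}-1}\right)\right].
```

Inverting this relation at each measured (P, T, f_LM) point yields Θ_D(P, T), from which the equilibrium β-factor follows within the quasiharmonic approximation, as the abstract describes.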